Abstract

The paper presents a multiplicative bias reduction estimator for
nonparametric regression. The approach consists of applying a
multiplicative bias correction to an oversmoothed pilot estimator. We study
the asymptotic properties of the resulting estimate and prove that this
estimate has zero asymptotic bias and the same asymptotic variance
as the local linear estimate. Simulations show that our asymptotic
results already hold for modest sample sizes. We also illustrate the
benefit of this new method on nuclear energy spectrum estimation.

Index terms: Nonparametric regression, bias reduction, local linear
estimate.

1 Introduction



The decay of radioactive isotopes often generates gamma particles whose
energy can be measured using specialized detectors. Typically, these detectors
count the number of particles in various energy bins over short time intervals
such as one to ten minutes. This enables estimation of the energy
distribution of the emitted particles, which is called the energy spectrum. For
low or medium resolution detectors, the spectrum is typically composed of
multiple broad peaks whose location and area characterize the radio-isotope.

Because the actual bin counts are noisy, and the energy spectrum is fairly
smooth, it has been proposed to estimate the energy spectrum using
nonparametric smoothing techniques (Sullivan et al., 2006; Gang et al., 2004).
However, it is known that many classical smoothers, such as kernel-based
regression smoothers, k-nearest neighbors, and smoothing splines, typically
under-estimate in the peaks and over-estimate in the valleys of the regression
function. See for example Simonoff (1996), Fan and Gijbels (1996), Wand and
Jones (1994) and Scott (1992). This bias degrades the performance of isotope
identification, or of any algorithm that uses peak areas or ratios of peak
areas (Casson et al., 2006), and motivates studying methods to reduce bias at
peaks and valleys. There are many approaches to reducing the bias, but most of
them do so at the cost of an increase in the variance of the estimator. For
example, one may choose to under-smooth the energy spectrum. Under-smoothing
will reduce the bias but will tend to generate spurious peaks. One can also
use a higher-order smoother, such as a local polynomial smoother with a
polynomial of order larger than one. While again this will lead to a smaller
bias, the smoother will have a larger variance. Another approach is to start
with a pilot smoother and to estimate its bias by smoothing the residuals
(Cornillon et al., 2009; Di Marzio and Taylor, 2008). Subtracting the
estimated bias from the smoother produces a regression smoother with smaller
bias and larger variance. In the context of estimating an energy spectrum, the
additive bias correction and the higher-order smoothers have the unfortunate
side effect of possibly generating a non-positive estimate.

An attractive alternative to the linear bias correction is the multiplicative
bias correction pioneered by Linton and Nielsen (1994). Because the
multiplicative correction does not alter the sign of the regression function,
this type of correction is particularly well suited for adjusting non-negative
regression functions. Jones et al. (1995) showed that if the true regression
function has four continuous derivatives, then the multiplicative bias
reduction is operationally equivalent to using an order four kernel. And while
this does remove the bias, it also increases the variance.

Although the bias-variance tradeoff for nonparametric smoothers is always
present in finite samples, it is possible to construct smoothers whose
asymptotic bias converges to zero while keeping the same asymptotic variance.
Hengartner and Matzner-Løber (2009) have exhibited a nonparametric density
estimator based on multiplicative bias correction with that property,
and have shown in simulations that their estimator also enjoys good finite
sample properties. In this paper, we present such an estimator for
nonparametric regression. We emphasize that a major difference between our
work and that of Jones et al. (1995) is that we do not assume that the
regression function has four continuous derivatives.

This paper is organized as follows. Section 2 introduces the notation and
defines the estimator. Section 3 gives the asymptotic behavior of the proposed
estimate. A brief simulation study on finite sample comparison is presented
in Section 4. Finally, in Section 5, the procedure is applied to estimate the
energy spectrum. The interested reader is referred to the Appendix where
we have gathered the technical proofs.



2 Preliminaries

2.1 Notation.

Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be $n$ independent copies of the pair of
random variables $(X, Y)$. We suppose that the explanatory variable
$X \in \mathbb{R}$ has probability density $f$ and model the dependence of the
univariate response variable $Y$ on the explanatory variable $X$ through the
nonparametric regression model

$$Y = m(X) + \varepsilon. \tag{1}$$

We assume that the regression function $m(\cdot)$ is smooth and that the
disturbance $\varepsilon$ is a mean zero random variable with finite variance
that is independent of the covariate $X$. Consider the linear smoothers for
the regression function $m(x)$, which we can write as

$$\hat m(x) = \sum_{j=1}^n \omega_j(x; h)\, Y_j,$$

where the weight function $\omega_j(x; h)$ depends on a tuning parameter $h$
that we think of as the bandwidth.

If the weight functions are such that $\sum_{j=1}^n \omega_j(x; h) = 1$ and
$\sum_{j=1}^n \omega_j(x; h)^2 = (nh)^{-1}\tau^2$, and if the disturbances
satisfy the Lindeberg-Feller condition, then the linear smoother obeys the
central limit theorem

$$\sqrt{nh}\,\Big(\hat m(x) - \sum_{j=1}^n \omega_j(x; h)\, m(X_j)\Big) \longrightarrow N(0, \tau^2). \tag{2}$$

We can use (2) to construct asymptotic pointwise confidence intervals for the
unknown regression function $m(x)$. But unless the limit of the scaled bias

$$b(x) = \lim_{n \to \infty} \sqrt{nh}\,\Big(\sum_{j=1}^n \omega_j(x; h)\, m(X_j) - m(x)\Big),$$

which we call the asymptotic bias, is zero, the confidence interval

$$\Big[\hat m(x) - z_{1-\alpha/2}\,\frac{\tau}{\sqrt{nh}},\ \hat m(x) + z_{1-\alpha/2}\,\frac{\tau}{\sqrt{nh}}\Big]$$

will not cover asymptotically the true regression function $m(x)$ at the
nominal $1 - \alpha$ level. The construction of valid pointwise $1 - \alpha$
confidence intervals for regression smoothers is another motivation for
developing estimators with zero asymptotic bias.
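
To make the loss of coverage concrete, suppose (as an illustrative assumption, not a result stated in this paper) that $\sqrt{nh}\,(\hat m(x) - m(x))$ is approximately $N(b(x), \tau^2)$. The nominal interval then covers $m(x)$ with limiting probability $\Phi(z_{1-\alpha/2} - b/\tau) - \Phi(-z_{1-\alpha/2} - b/\tau)$. The short Python sketch below evaluates this expression; the function name `asymptotic_coverage` is ours.

```python
from statistics import NormalDist


def asymptotic_coverage(bias_over_tau: float, alpha: float = 0.05) -> float:
    """Limiting coverage of the nominal (1 - alpha) interval when the
    centred and scaled estimator is N(b, tau^2); the argument is b / tau."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return std_normal.cdf(z - bias_over_tau) - std_normal.cdf(-z - bias_over_tau)


if __name__ == "__main__":
    # Coverage equals the nominal level only when the asymptotic bias is zero.
    for ratio in (0.0, 0.5, 1.0, 2.0):
        print(f"b/tau = {ratio:.1f}: coverage = {asymptotic_coverage(ratio):.3f}")
```

The coverage depends on the bias only through the ratio $b/\tau$, which is why removing the asymptotic bias without inflating $\tau$ is the relevant goal.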

2.2 Multiplicative bias reduction

Here we present a framework for multiplicative bias reduction. Given a pilot
smoother

$$\tilde m_n(x) = \sum_{j=1}^n \omega_j(x; h_0)\, Y_j,$$

the ratio

$$V_j = \frac{Y_j}{\tilde m_n(X_j)}$$

is a noisy estimate of $m(X_j)/\tilde m_n(X_j)$, the inverse relative
estimation error of the smoother $\tilde m_n$ at each of the observations.
Smoothing $V_j$ by

$$\hat\alpha_n(x) = \sum_{j=1}^n \omega_j(x; h_1)\, V_j$$

yields an estimate for the inverse of the relative estimation error which can
be used as a multiplicative correction of the pilot smoother. This leads to
the (nonlinear) smoother

$$\hat m_n(x) = \hat\alpha_n(x)\, \tilde m_n(x). \tag{3}$$
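
The three steps above (pilot smoother, ratios $V_j$, smoothed correction) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation: it uses local linear weights with a Gaussian kernel, and the names `local_linear_smooth`, `mbc_smooth` and the test function in the demo are hypothetical.

```python
import numpy as np


def local_linear_smooth(x_eval, X, Y, h):
    """Local linear estimate at the points x_eval (Gaussian kernel, bandwidth h)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    d = X[None, :] - x_eval[:, None]             # d[i, j] = X_j - x_i
    k = np.exp(-0.5 * (d / h) ** 2)              # kernel constants cancel in the weights
    s0 = k.sum(axis=1, keepdims=True)
    s1 = (d * k).sum(axis=1, keepdims=True)
    s2 = (d ** 2 * k).sum(axis=1, keepdims=True)
    weights = (s2 - d * s1) * k / (s2 * s0 - s1 ** 2)   # each row sums to one
    return weights @ Y


def mbc_smooth(x_eval, X, Y, h0, h1):
    """Multiplicative bias corrected estimate: pilot with h0, correction with h1."""
    pilot_at_data = local_linear_smooth(X, X, Y, h0)         # pilot at each X_j
    ratios = Y / pilot_at_data                               # V_j
    correction = local_linear_smooth(x_eval, X, ratios, h1)  # smoothed ratios
    pilot_at_eval = local_linear_smooth(x_eval, X, Y, h0)
    return correction * pilot_at_eval


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=200)
    m = lambda x: 5.0 + 4.0 * np.cos(10.0 * x)   # hypothetical positive test function
    Y = m(X) + 0.1 * rng.standard_normal(200)
    grid = np.array([-0.3, 0.0, 0.3])
    print("truth:", m(grid))
    print("pilot:", local_linear_smooth(grid, X, Y, 0.1))
    print("mbc  :", mbc_smooth(grid, X, Y, 0.1, 0.05))
```

The division by the pilot is the reason the regression function is required to be bounded away from zero: a pilot value close to zero would make the ratios $V_j$ unstable.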



The estimator (3) was studied for fixed design by Linton and Nielsen (1994)
and further studied by Jones et al. (1995). In both cases, they assumed that
the regression function had four continuous derivatives, and showed an
improvement in the convergence rate of the corrected estimator. Glad (1998)
proposes to use a parametrically guided local linear smoother and
Nadaraya-Watson smoother by starting with a parametric pilot. She shows that
the resulting estimate improves on the local polynomial estimate as soon as
the pilot captures some of the features of the regression function.



3 Theoretical Analysis of Multiplicative Bias Reduction

In this section, we will show that the multiplicative smoother has smaller
bias with essentially no cost to the variance, assuming only two derivatives
of the regression function. While our results are derived for local
linear smoothers, the technique used in the proofs can easily be adapted to
other linear smoothers, and the conclusions remain essentially unchanged.

3.1 Assumptions

We make the following assumptions:

1. The regression function is bounded and strictly positive, that is,
$b \ge m(x) \ge a > 0$ for all $x$.

2. The regression function is twice continuously differentiable everywhere.

3. The density of the covariate is strictly positive on the interior of its
support in the sense that $f(x) \ge b(\mathcal{K}) > 0$ over every compact
$\mathcal{K}$ contained in the support of $f$.

4. $\varepsilon$ has finite fourth moments and has a symmetric distribution
around zero.

5. Given a symmetric probability density $K(\cdot)$, consider the weights
$\omega_j(x; h)$ associated with the local linear smoother. That is, denote by
$K_h(\cdot) = K(\cdot/h)/h$ the kernel scaled by the bandwidth $h$ and define,
for $k = 0, 1, 2, 3$, the sums

$$S_k(x) = S_k(x; h) = \frac{1}{n}\sum_{j=1}^n (X_j - x)^k K_h(X_j - x).$$

Then

$$\omega_j(x; h) = \frac{1}{n}\,\frac{S_2(x; h) - (X_j - x)\, S_1(x; h)}{S_2(x; h)\, S_0(x; h) - S_1^2(x; h)}\, K_h(X_j - x).$$

We set

$$\omega_{0j}(x) = \omega_j(x; h_0) \quad \text{and} \quad \omega_{1j}(x) = \omega_j(x; h_1).$$

6. The bandwidths $h_0$ and $h_1$ are such that

$$h_0 \to 0, \quad h_1 \to 0, \quad n h_0 \to \infty, \quad n h_1^3 \to \infty, \quad \frac{h_1}{h_0} \to 0.$$

3.2 A technical aside

The proofs of Theorems 3.1 and 3.2 rest on establishing a stochastic
approximation of estimator (3) in which each term can be directly analyzed.

Proposition 3.1. We have

$$\hat m_n(x) = \mu_n(x) + \sum_{j=1}^n \omega_{1j}(x) A_j(x) + \sum_{j=1}^n \omega_{1j}(x) B_j(x) + \sum_{j=1}^n \omega_{1j}(x)\, \xi_j,$$

where $\mu_n(x)$, conditionally on $X_1, \dots, X_n$, is a deterministic
function, and $A_j$, $B_j$ and $\xi_j$ are random variables. Under the
condition $n h_0 \to \infty$, the remainder $\xi_j$ converges to 0 in
probability and we have

$$\hat m_n(x) = \mu_n(x) + \sum_{j=1}^n \omega_{1j}(x) A_j(x) + \sum_{j=1}^n \omega_{1j}(x) B_j(x) + O_P\Big(\frac{1}{n h_0}\Big).$$

Remark: A technical difficulty arises because even though $\xi_j$ may be small
in probability, its expectation may not be small. We resolve this problem by
showing that one only needs to modify $\xi_j$ on a set of vanishingly small
probability to guarantee that its expectation is also small.

Definition. Given a sequence of real numbers $a_n$, say that a sequence of
random variables $\xi_n = o_p(a_n)$ if for all fixed $t > 0$,

$$\limsup_{n \to \infty} P[|\xi_n| > t a_n] = 0.$$

We will need the following lemma.

Lemma 3.1. If $\xi_n = o_p(a_n)$, then there exists a sequence of random
variables $\xi_n^*$ such that

$$\limsup_{n \to \infty} P[\xi_n^* \ne \xi_n] = 0 \quad \text{and} \quad \mathbb{E}[\xi_n^*] = o(a_n).$$

We shall use the following notation 




3.3 Main results

We deduce from Proposition 3.1 and Lemma 3.1 the following theorem.

Theorem 3.1. Under the assumptions (1)-(6), the estimator $\hat m_n$ satisfies:

$$\mathbb{E}_*(\hat m_n(x) \mid X_1, \dots, X_n) = \mu_n(x) + O_p\Big(\frac{1}{n\sqrt{h_0 h_1}}\Big) + O_p\Big(\frac{1}{n h_0}\Big)$$

and 

If the bandwidth $h_0$ of the pilot estimator converges to zero much slower
than $h_1$, then $\hat m_n$ has the same asymptotic variance as the local
linear smoother of the original data with bandwidth $h_1$. However, for finite
samples, the two-step local linear smoother can have a slightly larger
variance depending on the choice of $h_0$. A limited Taylor expansion of
$\mu_n(x)$ leads to the following result.

Theorem 3.2. Under the assumptions (1)-(6), the estimator $\hat m_n$ satisfies:

$$\mathbb{E}_*(\hat m_n(x) \mid X_1, \dots, X_n) = m(x) + o_p(h_1^2).$$

Combining Theorem 3.1 and Theorem 3.2, we conclude that the multiplicative
adjustment performs a bias reduction on the pilot estimator without
increasing the asymptotic variance. The asymptotic behavior of the
bandwidths $h_0$ and $h_1$ is constrained by assumption 6. However, it is
easily seen that this assumption is satisfied for a large set of values of
$h_0$ and $h_1$. For example, the choice $h_1 = c_1 n^{-1/5}$ and
$h_0 = c_0 n^{-\alpha}$ for $0 < \alpha < 1/5$ leads to

$$\mathbb{E}_*(\hat m_n(x) \mid X_1, \dots, X_n) - m(x) = o_p(n^{-2/5})$$

and

$$\mathbb{V}_*(\hat m_n(x) \mid X_1, \dots, X_n) = O_p(n^{-4/5}).$$
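
As a quick numerical sanity check of this bandwidth choice, the sketch below tabulates the quantities that, as we read assumption 6, must satisfy $h_0 \to 0$, $h_1 \to 0$, $n h_0 \to \infty$, $n h_1^3 \to \infty$ and $h_1/h_0 \to 0$. The constants $c_0 = c_1 = 1$, the value $\alpha = 0.1$ and the function names are illustrative assumptions.

```python
def bandwidth_pair(n, alpha=0.1, c0=1.0, c1=1.0):
    """Return (h0, h1) with h0 = c0 * n^(-alpha) and h1 = c1 * n^(-1/5)."""
    h0 = c0 * n ** (-alpha)
    h1 = c1 * n ** (-0.2)
    return h0, h1


def assumption_quantities(n, alpha=0.1):
    """Quantities whose limits are constrained by the bandwidth assumption."""
    h0, h1 = bandwidth_pair(n, alpha)
    return {
        "h0": h0,
        "h1": h1,
        "n*h0": n * h0,
        "n*h1^3": n * h1 ** 3,
        "h1/h0": h1 / h0,
    }


if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        row = assumption_quantities(n)
        print(n, {name: round(value, 4) for name, value in row.items()})
```

With these rates, $h_1/h_0 = n^{\alpha - 1/5}$, so the ratio decays slowly when $\alpha$ is close to $1/5$; this is one way to see why the finite-sample variance can depend on the choice of $h_0$.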
4 Numerical examples

While the amount of the bias reduction depends on the curvature of the
regression function, a decrease is expected (asymptotically) everywhere, and
this at no cost to the variance. The simulation study in this section shows
that this asymptotic behavior emerges already at modest sample sizes.

4.1 Local study

To illustrate numerically the possible reduction in the bias and the
associated increase of the variance achieved by the multiplicative bias
correction, consider estimating the regression function

m{x) = 3 + 3|x|^/^ + + 4 cos(lOx) 

at $x = 0$ (see Figure 1).

Figure 1: The regression function to be estimated.

The local linear smoother tends to under-estimate regression functions at
their maxima, and hence this function provides a good test case. Furthermore,
because the second derivative of this regression function is continuous but
not differentiable at the origin, the results previously obtained by
Linton and Nielsen (1994) do not apply.

The data are simulated according to the model

$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \dots, 100,$$

where the $\varepsilon_i$ are independent $N(0, 0.1^2)$ variables. We first
consider the local linear estimate with a Gaussian kernel and study its
performance over a grid of bandwidths $\mathcal{H} = [0.005, 0.040]$. For the
new estimate, the theory recommends starting with an oversmoothed pilot
estimate. In this regard, we take $h_0 = 0.03$ and study the performance of
the multiplicative bias corrected estimate for
$h_1 \in \mathcal{H}_1 = [0.005, 0.060]$. In order to explore the sensitivity
of our two-stage estimator to $h_0$, we also consider the choice
$h_0 = 0.008$. For such a choice, the pilot estimate clearly under-smooths the
regression function.

The bias and the variance of each estimate are calculated at $x = 0$. To do
this, we compute the value of each estimate at $x = 0$ for 200 samples
$(X_i, Y_i)$, $i = 1, \dots, 100$. The same design $X_i$, $i = 1, \dots, 100$,
is used for each sample. It is generated according to a uniform distribution
over $[-1, 1]$. The bias at the point $x = 0$ is estimated by subtracting
$m(0)$ from the mean value of the estimate at $x = 0$ (the mean value is
computed over the 200 replications). Similarly, we estimate the variance at
$x = 0$ by the variance of the values of the estimate at this point. Figure 2
represents the squared bias, variance and mean square error of each estimate
for different values of the bandwidth $h$ for the local linear smoother and
$h_1$ for our estimate.
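
The Monte Carlo summaries described above can be computed as in the following sketch. It is illustrative only: the helper names, the seed, the bandwidth and the regression function used in the demo are our own assumptions, and the demo evaluates only a plain local linear estimate.

```python
import numpy as np


def mc_bias_variance(estimates, truth):
    """Squared bias, variance and MSE of replicated estimates of a known value."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - truth
    variance = estimates.var()      # population form, so MSE = bias^2 + variance
    mse = np.mean((estimates - truth) ** 2)
    return bias ** 2, variance, mse


def local_linear_at(x0, X, Y, h):
    """Local linear estimate at a single point x0 with a Gaussian kernel."""
    d = X - x0
    k = np.exp(-0.5 * (d / h) ** 2)
    s0, s1, s2 = k.sum(), (d * k).sum(), (d ** 2 * k).sum()
    return np.sum((s2 - d * s1) * k * Y) / (s2 * s0 - s1 ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = lambda x: 5.0 + 4.0 * np.cos(10.0 * x)   # hypothetical stand-in function
    X = rng.uniform(-1.0, 1.0, size=100)         # one design, reused for every sample
    estimates = [
        local_linear_at(0.0, X, m(X) + 0.1 * rng.standard_normal(100), h=0.05)
        for _ in range(200)
    ]
    bias2, var, mse = mc_bias_variance(estimates, m(0.0))
    print(f"bias^2 = {bias2:.5f}, variance = {var:.5f}, MSE = {mse:.5f}")
```

Using the population variance (divisor equal to the number of replications) makes the identity MSE = squared bias + variance hold exactly for the simulated values.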






Figure 2: Mean square error (dotted line), squared bias (solid line) and
variance (dashed line) of the local linear estimate (left) and multiplicative
bias corrected estimate with $h_0 = 0.03$ (center) and $h_0 = 0.008$ (right)
at point $x = 0$.

The first conclusion is that the corrected estimate has smaller bias than the
local linear estimate provided the pilot estimate over-smooths the regression
function. Small values of $h_0$ clearly under-smooth the regression function,
whatever the choice of $h_1$. Moreover, it is worth pointing out that our
procedure does not significantly increase the variance. Although Theorem 3.1
and Theorem 3.2 provide only asymptotic results, our simulations show that the
asymptotic behavior of our estimate emerges already at modest sample sizes.
Finally, due to the bias reduction, we note that our procedure also reduces
the optimal mean square error (see Table 1).

        MSE            Bias^2         Variance
LLE     3.38 × 10^-3   0.75 × 10^-3   2.63 × 10^-3
MBCE    2.04 × 10^-3   0.24 × 10^-3   1.80 × 10^-3

Table 1: Optimal mean square error (MSE) for the local linear estimate
(LLE) and the multiplicative bias corrected estimate (MBCE) with $h_0 = 0.03$
at point $x = 0$.



4.2 Global study

This paper does not develop a theory for selecting the two bandwidths $h_0$
and $h_1$ in an optimal way. If automatic procedures are needed, they can
be obtained by adjusting traditional automatic selection procedures for the
classical nonparametric estimators (see Burr et al., 2009). In this part, we
propose to use leave-one-out cross validation to choose both $h_0$ and $h_1$.
We then compare the performance of the selected estimate with the local
polynomial estimate in terms of integrated square error.



Hurvich et al. (1998) report a comprehensive numerical study that compares
standard smoothing methods on various test functions. Here, we take the
same setting to compare the local linear estimate with its multiplicative bias
corrected smoother. In each of the examples, we take the Gaussian kernel
$K(x) = \exp(-x^2/2)/\sqrt{2\pi}$. We consider the following regression
functions (see Figure 3):

(1) $m_1(x) = \sin(5\pi x)$

(2) $m_2(x) = \sin(15\pi x)$

(3) $m_3(x) = 1 - 48x + 218x^2 - 315x^3 + 145x^4$

(4) $m_4(x) = 0.3 \exp[-64(x - .25)^2] + 0.7 \exp[-256(x - .75)^2]$

and we take a Gaussian error distribution with standard deviation
$\sigma = 0.3$ for $m_1$, $m_2$, $m_3$ and $\sigma = 0.05$ for $m_4$.

Figure 3: Regression functions to be estimated.

We use a cross validation device to select both $h_0$ and $h_1$. This
selection procedure involves solving a minimization problem that requires a
search over a finite grid $\mathcal{H}$ of bandwidths $h_0$ and $h_1$.
Formally, given $\mathcal{H}$, we choose $\hat h_0$ and $\hat h_1$ such that

$$(\hat h_0, \hat h_1) = \operatorname*{argmin}_{(h_0, h_1) \in \mathcal{H} \times \mathcal{H}} \frac{1}{n} \sum_{i=1}^n \big(Y_i - \hat m_n^{i}(X_i)\big)^2.$$

Here $\hat m_n^{i}$ stands for the corrected local polynomial estimate computed
after deleting the $i$th observation. To assess the quality of the selected
estimate, we compare its performance with that of the local polynomial
estimate for which the bandwidth is again selected by leave-one-out cross
validation. The performance of an estimator $\hat m$ is measured by the
integrated square error

$$ISE(\hat m) = \int_0^1 \big(\hat m(x) - m(x)\big)^2\, dx,$$

and to avoid boundary effects, the design $X_1, \dots, X_n$ is generated
according to a uniform distribution over $[-0.2, 1.2]$.
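
The selection rule and the error measure can be sketched as follows. This is a minimal illustration, not the code used for the reported simulations: the function names, the bandwidth grid, the equispaced design and the positive regression function $2 + \sin(5\pi x)$ in the demo are all assumptions made to keep the example short and well conditioned.

```python
from itertools import product

import numpy as np


def local_linear_smooth(x_eval, X, Y, h):
    """Local linear estimate at x_eval (Gaussian kernel, bandwidth h)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    d = X[None, :] - x_eval[:, None]
    k = np.exp(-0.5 * (d / h) ** 2)
    s0 = k.sum(axis=1, keepdims=True)
    s1 = (d * k).sum(axis=1, keepdims=True)
    s2 = (d ** 2 * k).sum(axis=1, keepdims=True)
    return ((s2 - d * s1) * k / (s2 * s0 - s1 ** 2)) @ Y


def mbc_smooth(x_eval, X, Y, h0, h1):
    """Multiplicative bias corrected estimate (pilot h0, correction h1)."""
    ratios = Y / local_linear_smooth(X, X, Y, h0)
    correction = local_linear_smooth(x_eval, X, ratios, h1)
    return correction * local_linear_smooth(x_eval, X, Y, h0)


def loo_cv_score(X, Y, h0, h1):
    """Leave-one-out mean squared prediction error for the pair (h0, h1)."""
    n = len(X)
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # drop the i-th observation
        pred = mbc_smooth(X[i], X[keep], Y[keep], h0, h1)[0]
        sq_err[i] = (Y[i] - pred) ** 2
    return sq_err.mean()


def select_bandwidths(X, Y, grid):
    """Exhaustive search of the cross validation criterion over grid x grid."""
    return min(product(grid, grid), key=lambda pair: loo_cv_score(X, Y, *pair))


def integrated_square_error(estimate, truth, num=201):
    """Trapezoidal approximation of the ISE over [0, 1]."""
    xs = np.linspace(0.0, 1.0, num)
    sq = (estimate(xs) - truth(xs)) ** 2
    return float(np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(xs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = lambda x: 2.0 + np.sin(5.0 * np.pi * x)   # hypothetical positive function
    X = np.linspace(-0.2, 1.2, 80)                    # equispaced stand-in design
    Y = truth(X) + 0.2 * rng.standard_normal(X.size)
    h0, h1 = select_bandwidths(X, Y, grid=(0.04, 0.07, 0.10))
    ise = integrated_square_error(lambda xs: mbc_smooth(xs, X, Y, h0, h1), truth)
    print(f"selected h0 = {h0}, h1 = {h1}, ISE = {ise:.5f}")
```

The exhaustive search costs one full leave-one-out pass per pair of bandwidths, so its cost grows with the square of the grid size.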

Table 2 presents the median over 100 replications of

• the selected bandwidths;

• the integrated square error;

• the integrated square error of the local linear estimate divided by the
integrated square error of the corrected estimate ($R_{ISE}$).





        LLE                MBCE
        h       ISE        h0      h1      ISE       R_ISE
m1      0.029   0.021      0.053   0.041   0.017     1.226
m2      0.014   0.092      0.026   0.015   0.078     1.156
m3      0.029   0.021      0.070   0.056   0.012     1.600
m4      0.019   0.0010     0.033   0.024   0.0009    1.135

Table 2: Median over 100 replications of the selected bandwidths and of the
integrated square error of the selected estimates. LLE and MBCE stand for
local linear estimate and multiplicative bias corrected estimate.

We obtain a significant ISE reduction. As predicted by Theorem 3.1, the
data-driven procedure chooses $h_0$ bigger than $h_1$: the pilot estimate
over-smooths the true regression function. Of course, selecting both $h_0$ and
$h_1$ is time consuming and can be seen as the price to be paid to improve on
the local linear smoother.

Figure 4 presents, for the regression function $m_1$ with $n = 100$ and 100
replications, the different estimators on a grid of points. The solid line is
the true regression function, which is unknown in practice. For every point on
a fixed grid, we plot, side by side, the mean over 100 replications of our
estimator at that point (left side) and the mean over 100 replications of the
local polynomial estimator (right side). Leave-one-out cross validation is
applied to select the bandwidths $h_0$ and $h_1$ for our estimator and the
bandwidth $h$ for the local polynomial estimator. We also add the
interquartile interval in order to show the fluctuations of the different
estimators. On this example, our estimator reduces the bias by increasing the
peak and decreasing the valley, and the interquartile intervals look similar
for both estimators, as predicted by the theory.

Figure 4: The solid curve represents the true regression function, our
estimator is in dashed line and local linear smoother is dotted.



5 Example: Estimation of an energy spectrum

The energy spectrum of Ba-133 is measured at Los Alamos National Laboratory
using a 1024-energy-channel sodium iodide detector for a one-minute
count time. The calibration of channel to energy is not important in this
context, so we consider the one-minute counts versus bin number.

Figure 5 shows the raw counts versus bin number for the first 250 bins,
the histogram smoothed using a local linear smoother (dotted line) and the
multiplicative adjusted energy spectrum (dashed line). Observe that the
multiplicative adjusted smoother does indeed fit the peaks and the valleys of
the data better, without introducing undue variability on the rest of the
curve. This suggests that the multiplicative adjustment prior to peak height
and area estimation will improve isotope identification performance. Isotope
identification algorithms (see Casson et al., 2006) in the broadest context
must consider multiple unknown source isotopes, unknown form (gas, liquid,
solid), with unknown shielding between the source and detector, which
modifies spectral shape. Many algorithms rely on peak location, height, and/or
area, so the multiplicative adjustment is an appealing data processing step.
Sullivan et al. (2006) report success using wavelet smoothing to locate peaks
but do not consider the impact of smoothing on estimated peak height or
area.

Figure 5: See text.

6 Proofs

This section is devoted to the technical proofs.

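6.1 Proof of Proposition 3.1

Write the bias corrected estimator

$$\hat m_n(x) = \sum_{j=1}^n \omega_{1j}(x)\, \frac{\tilde m_n(x)}{\tilde m_n(X_j)}\, Y_j = \sum_{j=1}^n \omega_{1j}(x)\, R_j(x)\, Y_j,$$

and let us approximate the quantity $R_j(x)$. Define

$$\bar m_n(x) = \sum_{j=1}^n \omega_{0j}(x)\, m(X_j) = \mathbb{E}(\tilde m_n(x) \mid X_1, \dots, X_n),$$

and observe that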

mn{Xj) 



m, 



m„(Xj) 
where 



Write now Rj{ 



where $r_j(x, X_j)$ is a random variable converging to 0, to be defined later
on. Given the last expression and model (1), estimator (3) can be written as



n 



j=i J' 



n 



mn{Xj) 
rhn{x) 



$$= \mu_n(x) + \sum_{j=1}^n \omega_{1j}(x) A_j(x) + \sum_{j=1}^n \omega_{1j}(x) B_j(x) + \sum_{j=1}^n \omega_{1j}(x)\, \xi_j,$$

which is the first part of the proposition. Under the assumptions set forth in
Section 3.1, the pilot smoother $\tilde m_n$ converges to the true regression
function $m(x)$. Bickel and Rosenblatt (1973) show that this convergence is
uniform over compact sets $\mathcal{K}$ contained in the support of the
density of the covariate $X$. As a result



sup \mn{x) 



m„(x)| < -. 



So a limited expansion of (1 + m) ^ yields for x G /C 
thus 



[1 + A„(x) - A„(X,) + Op (|A„(x)A„(X,)| + AliX^))] 



= O,(|A„(x)A„(X,)|+A^(X,)). 
Under the stated regularity assumptions, we deduce that 

leading to the announced result. Proposition 3.1 is proved.



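6.2 Proof of Lemma 3.1

By definition

$$\limsup_{n \to \infty} P[|\xi_n| > t a_n] = 0$$

for all $t > 0$, so that a triangular array argument shows that there exists an
increasing sequence $m = m(k)$ such that

$$P\Big[|\xi_n| > \frac{a_n}{k}\Big] \le \frac{1}{k} \quad \text{for all } n \ge m(k).$$

For $m(k) \le n \le m(k + 1) - 1$, define

$$\xi_n^* = \begin{cases} \xi_n & \text{if } |\xi_n| \le a_n / k \\ 0 & \text{otherwise.} \end{cases}$$

It follows from the construction of $\xi_n^*$ that for
$n \in \{m(k), \dots, m(k + 1) - 1\}$,

$$P[\xi_n^* \ne \xi_n] \le P\Big[|\xi_n| > \frac{a_n}{k}\Big] \le \frac{1}{k},$$

which converges to zero as $n$ goes to infinity. Finally, setting
$k(n) = \sup\{k : m(k) \le n\}$, we obtain

$$\mathbb{E}[|\xi_n^*|] \le \frac{a_n}{k(n)} = o(a_n).$$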



6.3 Proof of Theorem 3.1

Recall that

$$\hat m_n(x) = \mu_n(x) + \sum_{j=1}^n \omega_{1j}(x) A_j(x) + \sum_{j=1}^n \omega_{1j}(x) B_j(x) + O_P\Big(\frac{1}{n h_0}\Big).$$

Focusing on the conditional bias, we get



E(/x„(x)|Xi,...,X„]) 
E(A,(x)|Xi,...,Xj) 
E(5,(x)|Xi,...,X„]) 



IJn[X) 



rfijx 



Thn{Xj) \mn{x) mn{Xj)J' 



Since 



n n n / 1 \ 



we deduce that 

n 

x) Xi, . . . ,X„ 

This proves the first part of the theorem.
For the conditional variance, we use the following expansion of the two-stage
estimator

m„(x) = (1 + [A.(x) - A„(X,)]) + O, . 

Using the fact that the residuals have finite fourth moments and a symmetric
distribution around 0, a moment's thought shows that

V(F, [A„(x) - A„(X,)] |Xi, . . . , X„) = Op (^-^ j 



and 



Hence 



Cov(r„ [A„(x) - K{X,)] . . . , X„) = Op (^-^^ . 



V.(m„(x)|Xi,...,X„) = V ^cui.i 



0=1 



Thn{Xj) 



-^1 5 • • • , Xn 



+ 0, 



nhr 



Observe that the first term on the right hand side of this equality can be
seen as the variance of the two-stage estimator with a deterministic pilot



estimator. It follows from I Glad 

f \ ^^^^^ V 

which proves the theorem. 



1998a I that 



-^1, • • • , 



n 
J = l 



CT" ^cur.ix) + 0„ 



^ \nhi 



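6.4 Proof of Theorem 3.2

Recall that

$$\mu_n(x) = \sum_{j \le n} \omega_{1j}(x)\, \frac{\bar m_n(x)}{\bar m_n(X_j)}\, m(X_j).$$

We consider the limited Taylor expansion of the ratio

$$\frac{m(X_j)}{\bar m_n(X_j)} = \frac{m(x)}{\bar m_n(x)} + (X_j - x)\Big(\frac{m(x)}{\bar m_n(x)}\Big)' + \frac{1}{2}(X_j - x)^2 \Big(\frac{m(x)}{\bar m_n(x)}\Big)'' (1 + o_p(1)),$$

then

$$\mu_n(x) = \bar m_n(x)\Big\{\frac{m(x)}{\bar m_n(x)} \sum_{j \le n} \omega_{1j}(x) + \Big(\frac{m(x)}{\bar m_n(x)}\Big)' \sum_{j \le n} (X_j - x)\,\omega_{1j}(x) + \frac{1}{2}\Big(\frac{m(x)}{\bar m_n(x)}\Big)'' \sum_{j \le n} (X_j - x)^2 \omega_{1j}(x)\,(1 + o_p(1))\Big\},$$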



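It is easy to verify that

$$\Sigma_0(x; h_1) = \sum_{j=1}^n \omega_{1j}(x) = 1,$$

$$\Sigma_1(x; h_1) = \sum_{j=1}^n (X_j - x)\,\omega_{1j}(x) = 0,$$

$$\Sigma_2(x; h_1) = \sum_{j=1}^n (X_j - x)^2 \omega_{1j}(x) = \frac{S_2^2(x; h_1) - S_3(x; h_1)\, S_1(x; h_1)}{S_2(x; h_1)\, S_0(x; h_1) - S_1^2(x; h_1)}.$$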
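
These three identities for the local linear weights (the weights sum to one, they annihilate the linear term, and their second moment is $(S_2^2 - S_3 S_1)/(S_2 S_0 - S_1^2)$) can be checked numerically. The sketch below does so with a Gaussian kernel and sums $S_k$ normalised by $1/n$; the normalisation is our assumption, and any common normalisation cancels in the weights. The function name is ours.

```python
import numpy as np


def local_linear_sums_and_weights(x0, X, h):
    """Return ([S_0, S_1, S_2, S_3], weights) of the local linear smoother at x0."""
    n = len(X)
    d = X - x0
    k = np.exp(-0.5 * (d / h) ** 2) / (h * np.sqrt(2.0 * np.pi))   # K_h(X_j - x0)
    S = [np.mean(d ** p * k) for p in range(4)]                    # S_k normalised by 1/n
    weights = (S[2] - d * S[1]) * k / (n * (S[2] * S[0] - S[1] ** 2))
    return S, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=150)
    x0, h = 0.2, 0.15
    S, w = local_linear_sums_and_weights(x0, X, h)
    d = X - x0
    print("sum of weights        :", w.sum())
    print("first moment          :", (d * w).sum())
    print("second moment         :", (d ** 2 * w).sum())
    print("closed form (S_k only):", (S[2] ** 2 - S[3] * S[1]) / (S[2] * S[0] - S[1] ** 2))
```

The first two identities are the reason a local linear smoother reproduces linear functions exactly, so that only the second-order term of the Taylor expansion contributes to the bias.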



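For random designs, we can further approximate (see, e.g., Wand and Jones,
1995)

$$S_k(x; h_1) = \begin{cases} h_1^k\, \sigma_K^k f(x) + o_p(h_1^k) & \text{for } k \text{ even} \\ h_1^{k+1} \sigma_K^{k+1} f'(x) + o_p(h_1^{k+1}) & \text{for } k \text{ odd,} \end{cases}$$

where $\sigma_K^k = \int u^k K(u)\, du$. Therefore

$$\Sigma_2(x; h_1) = \sigma_K^2 h_1^2 + o_p(h_1^2),$$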
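so that we can write $\mu_n(x)$ as

$$\mu_n(x) = \bar m_n(x)\Big\{\frac{m(x)}{\bar m_n(x)} + \frac{\sigma_K^2 h_1^2}{2}\Big(\frac{m(x)}{\bar m_n(x)}\Big)'' + o_p(h_1^2)\Big\} = m(x) + \frac{\sigma_K^2 h_1^2}{2}\, \bar m_n(x) \Big(\frac{m(x)}{\bar m_n(x)}\Big)'' + o_p(h_1^2).$$

Expanding

$$\Big(\frac{m(x)}{\bar m_n(x)}\Big)'' = \frac{\bar m_n^2(x)\, m''(x)}{\bar m_n^3(x)} - 2\,\frac{\bar m_n(x)\, \bar m_n'(x)\, m'(x)}{\bar m_n^3(x)} - \frac{m(x)\, \bar m_n(x)\, \bar m_n''(x)}{\bar m_n^3(x)} + 2\,\frac{m(x)\, (\bar m_n'(x))^2}{\bar m_n^3(x)}$$

and applying the usual approximations, we conclude that

$$\Big(\frac{m(x)}{\bar m_n(x)}\Big)'' = o_p(1).$$

Putting all pieces together, we obtain

$$\mathbb{E}_*(\hat m_n(x) \mid X_1, \dots, X_n) - m(x) = o_p(h_1^2) + O_p\Big(\frac{1}{n\sqrt{h_0 h_1}}\Big) + O_p\Big(\frac{1}{n h_0}\Big).$$

Since

$$n h_1^3 \to \infty \quad \text{and} \quad \frac{h_1}{h_0} \to 0,$$

we conclude that the bias is of order $o_p(h_1^2)$.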